In statistics, and especially Bayesian statistics, the posterior predictive distribution is the distribution of unobserved values (predictions) conditional on the observed data.<ref>http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mcmc_sect034.htm</ref> It is the distribution that a new i.i.d. data point <math>\tilde{x}</math> would have, given a set of ''N'' existing i.i.d. observations <math>\mathbf{X} = \{x_1, \dots, x_N\}</math>. In a frequentist context, this might be derived by computing the maximum likelihood estimate (or some other point estimate) of the parameter(s) given the observed data and plugging it into the distribution function of the new observations. However, the concept of a posterior predictive distribution is normally used in a Bayesian context, where it makes use of the entire posterior distribution of the parameter(s) given the observed data, yielding a full probability distribution over the new data point rather than one based on a single point estimate. Specifically, it is computed by marginalising over the parameters, using the posterior distribution:

:<math>p(\tilde{x} \mid \mathbf{X}, \alpha) = \int_\theta p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta,</math>

where <math>\theta</math> represents the parameter(s) and <math>\alpha</math> the hyperparameter(s). Any of <math>\tilde{x}, \mathbf{X}, \theta, \alpha</math> may be vectors (or equivalently, may stand for multiple parameters). Note that this is equivalent to the expected value of the distribution of the new data point, when the expectation is taken over the posterior distribution, i.e.:

:<math>p(\tilde{x} \mid \mathbf{X}, \alpha) = \operatorname{E}_{\theta \mid \mathbf{X}, \alpha}\left[\, p(\tilde{x} \mid \theta) \,\right].</math>

(To get an intuition for this, keep in mind that the expected value is a type of average. The predictive probability of seeing a particular value of a new observation varies depending on the parameters of the distribution of that observation. In this case we do not know the exact values of the parameters, but we have a posterior distribution over them, which specifies what we believe the parameters to be, given the data we have already seen.
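The marginalisation above can be approximated by averaging <math>p(\tilde{x} \mid \theta)</math> over draws from the posterior. The following sketch illustrates this for a conjugate Beta–Bernoulli model, where the posterior predictive also has a closed form we can check against; the hyperparameters and data are illustrative, not taken from the text.

```python
import random

# Illustrative setup: Beta(a, b) prior on the success probability theta of a
# Bernoulli likelihood. For this conjugate pair the posterior predictive
# P(x_new = 1 | X) = E_{theta|X}[theta] = a'/(a' + b'), which lets us verify
# the Monte Carlo average over posterior draws.

a, b = 2.0, 2.0                       # prior hyperparameters
data = [1, 0, 1, 1, 0, 1, 1]          # observed Bernoulli draws
a_post = a + sum(data)                # conjugate update: a' = a + #successes
b_post = b + len(data) - sum(data)    # b' = b + #failures

# Exact posterior predictive probability of a new success:
exact = a_post / (a_post + b_post)

# Monte Carlo: average P(x_new = 1 | theta) = theta over posterior samples.
random.seed(0)
samples = [random.betavariate(a_post, b_post) for _ in range(200_000)]
mc = sum(samples) / len(samples)

print(exact)   # 7/11, about 0.636
print(abs(mc - exact) < 0.01)
```

The same pattern (sample parameters from the posterior, average the likelihood of the new point) applies even when no closed form exists, e.g. with MCMC output.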
Logically, then, to get "the" predictive probability, we should average all of the various predictive probabilities over the different possible parameter values, weighting them by how strongly we believe in them. This is exactly what this expected value does. Compare this to the approach in frequentist statistics, where a single estimate of the parameters, e.g. a maximum likelihood estimate, would be computed and then plugged in. This is equivalent to averaging over a posterior distribution with no variance, i.e. one in which we are completely certain that the parameter has a single value. The result is weighted too strongly towards the mode of the posterior and takes no account of other possible values, unlike in the Bayesian approach.)

==Prior vs. posterior predictive distribution==

The prior predictive distribution, in a Bayesian context, is the distribution of a data point marginalized over its prior distribution. That is, if <math>\tilde{x} \sim F(\tilde{x} \mid \theta)</math> and <math>\theta \sim G(\theta \mid \alpha)</math>, then the prior predictive distribution is the corresponding distribution <math>H(\tilde{x} \mid \alpha)</math>, where

:<math>p_H(\tilde{x} \mid \alpha) = \int_\theta p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha) \, d\theta.</math>

Note that this is similar to the posterior predictive distribution except that the marginalization (or equivalently, expectation) is taken with respect to the prior distribution instead of the posterior distribution.

Furthermore, if the prior distribution is a conjugate prior, then the posterior predictive distribution will belong to the same family of distributions as the prior predictive distribution. This is easy to see: if the prior distribution is conjugate, then

:<math>p(\theta \mid \mathbf{X}, \alpha) = p_G(\theta \mid \alpha'),</math>

i.e. the posterior distribution also belongs to <math>G,</math> but simply with a different parameter <math>\alpha'</math> in place of the original parameter <math>\alpha.</math> Then,

:<math>\begin{align}
p(\tilde{x} \mid \mathbf{X}, \alpha) &= \int_\theta p_F(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta \\
&= \int_\theta p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha') \, d\theta \\
&= p_H(\tilde{x} \mid \alpha').
\end{align}</math>

Hence, the posterior predictive distribution follows the same distribution ''H'' as the prior predictive distribution, but with the posterior values of the hyperparameters substituted for the prior ones.
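The "same family, updated hyperparameters" property can be made concrete with the Beta–Bernoulli pair (illustrative numbers, not from the text): both predictive distributions are Bernoulli, with the prior predictive using <math>(a, b)</math> and the posterior predictive using the updated <math>(a', b')</math> in the very same formula.

```python
# Sketch of the "same family" claim for a conjugate pair. With a Beta(a, b)
# prior on a Bernoulli parameter theta, the predictive distribution of a new
# point is Bernoulli with p = a/(a + b); after observing data, the posterior
# predictive is the SAME functional form H, evaluated at the updated
# hyperparameters a', b' instead of a, b.

def bernoulli_predictive(a, b):
    """Predictive P(x = 1) under a Beta(a, b) belief about theta."""
    return a / (a + b)

a, b = 1.0, 3.0
data = [1, 1, 0, 1]

prior_pred = bernoulli_predictive(a, b)           # H evaluated at (a, b)
a_new = a + sum(data)                             # a' = a + #successes
b_new = b + len(data) - sum(data)                 # b' = b + #failures
post_pred = bernoulli_predictive(a_new, b_new)    # H evaluated at (a', b')

print(prior_pred)  # 1/4 = 0.25
print(post_pred)   # 4/8 = 0.5
```

Only the hyperparameters change between the two calls; the distribution family ''H'' does not.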
The prior predictive distribution is in the form of a compound distribution, and in fact is often used to ''define'' a compound distribution, because of the lack of any complicating factors such as the dependence on the data <math>\mathbf{X}</math> and the issue of conjugacy. For example, the Student's t-distribution can be ''defined'' as the prior predictive distribution of a normal distribution with known mean ''μ'' but unknown variance ''σ''<sub>''x''</sub><sup>2</sup>, with a conjugate scaled-inverse-chi-squared distribution placed on ''σ''<sub>''x''</sub><sup>2</sup>, with hyperparameters ''ν'' and ''σ''<sup>2</sup>. The resulting compound distribution is indeed a non-standardized Student's t-distribution, and follows one of the two most common parameterizations of this distribution. The corresponding posterior predictive distribution is then again Student's t, with the updated hyperparameters that appear in the posterior distribution also appearing directly in the posterior predictive distribution.

Note that in some cases the appropriate compound distribution is defined using a different parameterization than the one that would be most natural for the predictive distributions in the problem at hand. Often this is because the prior distribution used to define the compound distribution is different from the one used in the current problem. For example, as indicated above, the Student's t-distribution was defined in terms of a scaled-inverse-chi-squared distribution placed on the variance. However, it is more common to use an inverse gamma distribution as the conjugate prior in this situation. The two are in fact equivalent apart from parameterization; hence, the Student's t-distribution can still be used for either predictive distribution, but the hyperparameters must be reparameterized before being plugged in.

(Excerpted from the free encyclopedia Wikipedia; the full article is "Posterior predictive distribution".)
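The compounding described above can be checked numerically: draw a variance from a scaled-inverse-chi-squared prior, then draw the data point from a normal with that variance, and the marginal should match a non-standardized Student's t. The sketch below (all numbers illustrative) uses the fact that a scaled-inverse-chi-squared(''ν'', ''σ''<sup>2</sup>) draw is ''νσ''<sup>2</sup> divided by a chi-squared(''ν'') draw, and compares the sample variance to the t-distribution's theoretical variance ''νσ''<sup>2</sup>/(''ν'' − 2) for ''ν'' > 2.

```python
import random

# Monte Carlo sketch of the compound construction (illustrative parameters):
#   sigma^2 ~ scaled-inverse-chi-squared(nu, s2), then x ~ N(mu, sigma^2).
# Marginally, x should follow a non-standardized Student's t with nu degrees
# of freedom, location mu, and squared scale s2; for nu > 2 its variance is
# nu * s2 / (nu - 2), which we compare to the sample variance.

random.seed(1)
mu, nu, s2 = 0.0, 5.0, 2.0
n = 400_000

xs = []
for _ in range(n):
    chi2 = random.gammavariate(nu / 2, 2)   # chi-squared(nu) draw
    sigma2 = nu * s2 / chi2                 # scaled-inverse-chi-squared(nu, s2)
    xs.append(random.gauss(mu, sigma2 ** 0.5))

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(var)  # should be near nu * s2 / (nu - 2) = 10/3
```

Replacing the scaled-inverse-chi-squared draw with the equivalent inverse-gamma(''ν''/2, ''νσ''<sup>2</sup>/2) draw gives the same marginal, which is the reparameterization equivalence mentioned above.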